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Abstract 

The olfactory receptor (OR) genes represent the largest multigene family in the genome of terrestrial vertebrates. Here, the 
high-throughput next-generation sequencing (NGS) approach was applied to characterization of OR gene repertoires in the 
green anole lizard Anolis carolinensis and the Japanese four-lined ratsnake Elaphe quadrivirgata. Tagged polymerase chain 
reaction (PCR) products amplified from either genomic DNA or cDNA of the two species were used for parallel 
pyrosequencing, assembling, and screening for errors in PCR and pyrosequencing. Starting from the lizard genomic DNA, we 
accurately identified 56 of 136 OR genes that were identified from its draft genome sequence. These recovered genes were 
broadly distributed in the phylogenetic tree of vertebrate OR genes without severe biases toward particular OR families. 
Ninety-six OR genes were identified from the ratsnake genomic DNA, implying that the snake has more OR gene loci than 
the anole lizard in response to an increased need for the acuity of olfaction. This view is supported by the estimated number 
of OR genes in the Burmese python's draft genome (—280), although squamates may generally have fewer OR genes than 
terrestrial mammals and amphibians. The OR gene repertoire of the python seems unique in that many class I OR genes are 
retained. The NGS approach also allowed us to identify candidates of highly expressed and silent OR gene copies in the 
lizard's olfactory epithelium. The approach will facilitate efficient and parallel characterization of considerable unbiased 
proportions of multigene family members and their transcripts from nonmodel organisms. 
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Introduction 

Natural environments are filled with various odors. These 
odors are rich in information, and thus, most animals have 
evolved an acute sense of smell to detect and interpret 
them. In vertebrates, odor chemicals are mainly detected 
by olfactory receptors (ORs) that are expressed in the olfac- 
tory sensory neurons (reviewed in Mombaerts 2004). To dis- 
criminate vast numbers of odor chemicals, the number of 
vertebrate ORs is highly increased hundreds or thousands 



of intact OR genes being found in one species (reviewed 
in Nei et al. 2008). OR genes thus represent the largest 
multigene family in the genome of terrestrial vertebrates. 
Discrimination of odor chemicals is based on the "combina- 
torial coding" manner, in which most odorants are identified 
not by the activation of a single OR but by the activation 
pattern of multiple ORs (Su et al. 2009). 

Previous studies have suggested that acuity of olfaction in 
vertebrates is reflected by the copy number of functional OR 
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genes and/or the percentage of pseudogenes within a spe- 
cies (Gilad et al. 2004; Kishida et al. 2007; Steiger et al. 
2008, 2009; Hayden et al. 2010). Therefore, comparative 
study of OR diversity among ecologically divergent species 
may provide significant insights into adaptive evolution of 
odor perception. However, identifying individual members 
of the OR gene repertoire in a species by subcloning and 
Sanger sequencing strategy is very difficult because of 
the large number and sequence diversity of the OR genes. 
At present, the best and the only way to obtain nearly 
complete OR gene repertoire is in silico screening of the 
whole-genome sequence, but genomic databases are 
available only for a mere handful of model organisms. 

In vertebrate groups in which genomic data have been 
published for multiple species (i.e., mammals, birds, and 
teleost fishes), copy numbers of the OR genes are highly var- 
iable between species (Alioto and Ngai 2005; Niimura and 
Nei 2007; Steiger et al. 2008). In April 2009 when we started 
the present study, reptilian draft genome sequences were 
available only for the green anole lizard Anolis carolinensis, 
and thus, variation of the OR copy number among reptilian 
taxa was not known. The number of OR genes estimated 
from the anole lizard draft genome was smaller than those 
identified for other vertebrate groups (Niimura 2009; 
Steiger et al. 2009; Kishida and Hikida 2010). However, 
the lower number of OR genes in the anole lizard may 
not be representative of reptiles because many reptilian 
species possess highly developed sense of smell (Pianka 
and Vitt 2003; Vitt et al. 2003). To understand the evolution 
of olfactory ability in reptiles, it seems crucial to investigate 
the OR gene repertoire for organisms other than A. caroli- 
nensis, although studies of reptilian OR genes are very 
limited (e.g., Kishida et al. 2007; Kishida and Hikida 2010). 

High-throughput next-generation sequencing (NGS) is 
rapidly changing methodologies of molecular genetics stud- 
ies (Mardis 2007). Recent development of Roche GS FLX 
Titanium DNA sequencing technology enables one to se- 
quence numerous DNA fragments of more than 400 bp 
in average size (~1 kbp with the latest specification in Feb- 
ruary 2012) without the vector-based cloning that tends to 
introduce a bias in cloned sequences (http://454.com/ 
products-solutions/454-sequencing-system-portfolio.asp). 
Furthermore, this method can potentially discriminate poly- 
merase chain reaction (PCR) errors from true sequences 
by sequencing the same DNA regions multiple times. These 
advantages can make the FLX-based NGS approach suitable 
for characterizing large multigene families, such as the ver- 
tebrate OR gene family. Indeed, this approach has been 
shown to be effective in characterizing the polymorphic 
multilocus MHC system (Babik et al. 2009). The OR genes, 
however, form a more complicated multigene family than 
the MHC genes, and the accuracy and efficiency of the 
NGS approach for investigating the vertebrate OR genes 
need to be evaluated thoroughly. 



In the present study, we first attempted to assess the use- 
fulness of the NGS approach for experimental identification 
of OR genes in the anole lizard, which can be evaluated 
based on the in silico identified OR gene repertoire from 
its draft genome sequence. We show that this approach 
can provide a reliable view of the lizard's OR gene repertoire 
by recovering a considerable proportion of OR genes 
encoded in its genome. We then applied this approach to 
characterization of the OR gene repertoire in the Japanese 
four-lined ratsnake Elaphe quadrivirgata. The ratsnake and 
the Burmese python being the second reptilian taxa with 
a new draft genome sequence (Castoe et al. 2011) are 
known to have developed a life style that is highly depen- 
dent on the olfaction, whereas Anolis lizards are believed to 
rely on the visual sense for the prey capture and the escape 
from predators (Pianka and Vitt 2003). Thus, comparison of 
the OR gene repertoire between the snakes and the anole 
lizard may provide insights into molecular evolution of the 
olfactory genes in squamate reptiles. 

Materials and Methods 

Identification of OR Genes from the Anole Lizard and 
the Python Genome Assembly 

We examined the draft genomic sequences of the green 
anole lizard (AnoCar2.0, May 2010; http://www.ensembl. 
org/Anolis_carolinensis/lnfo/lndex/; Alfdldi et al. 2011) 
and the Burmese python (GenBank ID: AEQUOOOOOOOOO; 
Castoe et al. 201 1) to identify the nearly complete OR gene 
repertoire in each species. OR sequences were identified by 
a method that was used to find fish vomeronasal-type ORs 
(Hashiguchi and Nishida 2006) with slight modifications. 
First, a TBIastN search was conducted with the cutoff E value 
of 10~ 10 against the genomic data using several represen- 
tative vertebrate OR amino acid sequences as queries. 
Obtained sequences were verified as ORs by BlastP searches 
against NCBI nonredundant(nr) database. Next, each region 
of Blast similarity was extended to 1 kb in 5' and 3' direc- 
tions to predict OR-coding sequences. For each of these ge- 
nomic regions, intronless OR-coding sequences were 
estimated by the profile hidden Markov model (profile 
HMM)-based gene prediction with the program WISE2 
(Birney et al. 2004). A profile HMM was constructed from 
the alignment of known OR sequences from human, frog, 
and fish using the HMMER software package (http:// 
hmmer.janelia.org). Positions of initiation and stop codons 
of the obtained OR-coding sequences were identified 
manually. 

The anole lizard putative OR sequences were classified 
into two groups, apparently functional genes and nonfunc- 
tional pseudogenes. If a sequence contained any disruptive 
frameshift and/or stop codon, it was considered as a pseu- 
dogene. In this study, partial sequences (less than 600 bp) 
were also classified as pseudogenes, although some partial 
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Table 1 

Primers Used in This Study 





Species 


1 CI 1 1 fj 1 d I c 


Primpr ^pm ipnrp 

It IIIICI JCUUCIILC 


AcORg_F1 


Anolis carolinensis 


Genomic DNA 


GGGCTCTGAGATGGCATATGAYCGVTAYKTKGC 


AcORg_R1 


A. carolinensis 


Genomic DNA 


GGGCTGTCAGGAACAGGTRGARAARGCYTT 


AcORm_F1 


A. carolinensis 


Nose cDNA 


GGGCTCGTAGATGGCATATGAYCGVTAYKTKGC 


AcORm_R1 


A. carolinensis 


Nose cDNA 


GGGCTGTACGGAACAGGTRGARAARGCYTT 


EqORg_F1 


Elaphe quadrivirgata 


Genomic DNA 


GGGCTCGATGATGGCATATGAYCGVTAYKTKGC 


EqORg_R1 


E. quadrivirgata 


Genomic DNA 


GGGCTGCTAGGAACAGGTRGARAARGCYTT 


F2 


A. carolinensis 


Genomic DNA 


ATGGCATATGAYCGVTAYNTDGC 


R2 


A. carolinensis 


Genomic DNA 


GAACAGGTDGARARWGYYTT 


F3 


A. carolinensis 


Genomic DNA 


CATATGAYCGVTAYKTDGCYATHTG 


R3 


A. carolinensis 


Genomic DNA 


CAGTAARRTGGGARSHRCADGTDGA 



Note. — The first six primers were used for the NGS experiments, whereas the other four primers were used for manual sequencing of clones for amplified products. Tag sequences 
are indicated by underlines. Note that three forward primers ending in F1 share identical sequences after the tag sequences (the F1 primer sequence) and that three reverse primers 
ending in R1 do so (the R1 primer sequence). F1-F3 primers are forward primers, and R1-R3 primers are reverse ones. F2 and R2 primers are designed in the same location as F1 and 
R1 primers, respectively, but have slightly different bases in some positions. F3 and R3 primers are designed in different locations, but the amplified F3-R3 region largely overlaps with 
the F1-R1 (= F2-R2) region. 



genes may have resulted from incomplete genome assem- 
bly. The python putative OR sequences were classified into 
three groups, functional genes, pseudogenes, and trun- 
cated (partial) genes, because the python genome con- 
tained many truncated OR copies found in very short 
(~2 kb) contigs. In the python, all nondisrupted OR sequen- 
ces of less than 800 bp in length were classified as truncated 
genes. Each OR sequence identified was searched against 
the HORDE (the Human Olfactory Data Explorer) #42 data- 
base (Olender et al. 2004, http://genome.weizmann.ac.il/ 
horde/) using the FASTA search and classified into the 
same family as the best-hit human OR sequence. Our family 
classification followed that by Glusman et al. (2000). 

Sample Collection and DNA/RNA Extraction 

Genomic DNA of the green anole lizard was extracted using 
a DNeasy Tissue Kit (QIAGEN) from muscle tissues of a young 
dead individual obtained from a pet shop in 2002. Genomic 
DNA of the Japanese four-lined ratsnake was similarly 
extracted from blood samples of a female individual caught 
in Shiga Prefecture, Japan through the courtesy of 
Dr Michihisa Toriba. Total RNA of the anole lizard was 
extracted from a male individual that was captured at 
Chichi-jima Island, Ogasawara, Japan in 2008 with per- 
mission from the Ministry of the Environment. An upper 
jaw portion containing both nasal and vomeronasal parts 
of olfactory organs were excised immediately after killing 
the animal and cut into 5 mm pieces (see supplementary 
fig. 1, Supplementary Material online that illustrates the 
excised portion and provides evidence that it covers these 
organs). In this study, we were unable to excise nasal and 
vomeronasal parts separately from each other. 

Cells were disrupted in Lysing Matrix D tubes for 30 s at 
the 6.5 m/s power with Fastprep-24 instrument (MP 
Biomedicals), from which total RNA was extracted using 
a mirVana miRNA Isolation Kit (Ambion) according to the 



manufacture's instruction. After the treatment of resultant 
RNA samples with TURBO DNase free (Ambion) for 1 h at 
37 °C to degrade possibly remaining DNA fractions com- 
pletely (see supplementary fig. 2, Supplementary Material 
online), reverse transcription reaction was carried out using 
a High-Capacity cDNA Reverse Transcription Kit (Applied 
Biosystems) and the random hexamers primer, following 
the manufacturer's protocol. Incubation time was 10 min 
at 25 °C, 2 h at 37 °C, and finally 5 s at 85 °C for inactivating 
the reverse transcriptase. 

PCR Amplification and the High-Throughput Sequencing 

To amplify OR sequences of the anole lizard and the 
ratsnake, degenerate primers were designed within con- 
served regions among tetrapod OR genes. Known OR genes 
from the anole lizard, human, and frog (Xenopus tropicalis) 
were mainly used for this purpose. Forward and reverse pri- 
mers were designed in the third transmembrane region and 
the fourth intracellular region of OR genes, respectively. 
Expected length of PCR products with these primers was 
—331 bp long. For amplification from the anole lizard ge- 
nomic DNA, AcORg_F1 was used as a forward primer 
and AcORg_R1 as a reverse primer (table 1). Each primer 
started with GGGC followed by a 6-bp tag for identifying 
species and PCR templates (genomic DNA or cDNA). GGGC 
tetranucleotide at the 5'-end of primers was used to elim- 
inate the effect of the 5'-terminal nucleotide on the tag 
efficiency (Binladen et al. 2007; Valentini et al. 2009). To 
discriminate three PCR reactions with different templates 
(the anole lizard genomic DNA, the anole lizard cDNA, 
and the ratsnake genomic DNA) and forward/reverse 
strands of each reaction, we used six different tag sequences 
(table 1). 

PCR was performed in a 10 \x\ reaction mixture using 
a PrimeSTAR HS DNA polymerase (Takara Bio), 0.5 u.M each 
primer, and template DNA either from diluted genomic 
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Table 2 

Summary of Obtained OR Sequences by the Next-Generation Sequencing 



Species 


Template 


FLX Reads 
(contigs) 3 


OR-Related Reads 
(contigs) b 


OR-Coding Reads 
(contigs) c 


DDBJ Accession 
Number 


Specimen Voucher 
Number d 


Anolis carolinensis 
A. carolinensis 
Elaphe quadrivirgata 


Genomic DNA 
Nose cDNA 
Genomic DNA 


6196 (195) 
6854 (194) 
6202 (253) 


5182 (71) 
6329 (70) 
5831 (140) 


5043 (56) 
5966 (40) 
5505 (96) 


AB646799-AB646854 
FX180060-FX 180099 
AB646855-AB646950 


SDNCU-A0007 
SDNCU-A0008 



a The number of reads (initially assembled contigs in parentheses) that had the corresponding tag sequences. 

b The number of reads that constituted the OR-related contigs after excluding non-OR sequences and the OR contigs with <5x coverages as well as unifying sequences with 
<1 % sequence divergences (see text). 

c The number of reads that constituted putatively legitimate OR-coding sequences after excluding chimeras for the lizard sequences and after excluding contigs with <1 1 x 
coverages for the ratsnake sequences (see text). Database accession numbers for the resultant contig sequences are also given in the next column. 
d Whole body frozen specimens are deposited to the Specimen Depository of the Graduate School of Natural Sciences, Nagoya City University. 



DNA or randomly reverse-transcribed cDNA. PCR reaction 
cycle scheme was 98 °C for 30 s, followed by 28 cycles 
of 98 °C for 10 s, 50 °C for 15 s, and 72 °C for 30 s. 
PCR products were electrophoresed in a 1% agarose gel 
and purified using a MinElute Gel Extraction Kit (QIAGEN). 
Concentrations of PCR products were measured using 
NanoVue Plus Spectrophotometer (GE Healthcare). Equal 
amounts (150 ng) of individual amplicons were then 
pooled from the three sources (i.e., the lizard genomic 
DNA, the lizard cDNA, and the ratsnake genomic DNA) 
and sequenced as a part of single GS FLX Titanium Genome 
Analyzer (Roche) sequencing run. This was conducted as 
an outsourcing service by Takara Bio, Inc. Raw reads data 
obtained by the FLX sequencing have been deposited 
to the Read Archive at DDBJ with accession numbers 
DRA000409-DRA000411. 

Assembling the NGS Data 

First, reads obtained from the FLX sequencing were divided 
into the three groups on the basis of the sequence tags (see 
table 1 ). Second, these reads were assembled into contigs by 
Sequencher version 4.8 (Gene Codes) with the default 
setting. Contigs that consist of less than five FLX reads 
(hereby designated <5x coverage), and all singletons were 
excluded from the data set because they are more likely af- 
fected by PCR errors and chimeras than contigs with denser 
coverages (see below). Note that the read number in a contig 
is equivalent to the read coverage per site because most FLX 
reads span the amplified region. Contigs with less than 1 % 
nucleotide differences were considered as the same se- 
quence, which originated from alleles of the same locus 
or PCR errors. Under this criterion, almost all OR gene se- 
quences identified in the anole lizard genome can be recog- 
nized as separate genes (data not shown). Consensus 
sequences of each resultant contig were queried by a BlastX 
search against NCBI nr database in order to verify that they 
are really OR gene members. If the BlastX best hit was a pre- 
viously known OR, it was considered a putative OR-coding 
sequence. Each OR-coding sequence thus identified was 
queried against the HORDE #42 Database using the FASTA 



search and classified into the same family as the best-hit hu- 
man OR sequence. These OR-coding sequences obtained in 
this study have been deposited to the DDBJ/EMBL/NCBI Nu- 
cleotide Sequence Database with accession numbers shown 
in table 2. 

Assessment of OR Sequences Obtained by the NGS 
Approach 

It is generally known that high-throughput NGS methods 
are affected by higher error rates than the traditional Sanger 
sequencing method, depending on different sequence con- 
texts (Moore et al. 2006). Under- or overcalls of homopol- 
ymer runs are typical errors in pyrosequencing with Roche 
GS FLX Titanium Genome Sequencer (Margulies et al. 2005; 
Moore et al. 2006). In addition, generation of sequence chi- 
meras by PCR amplification also cannot be ignored for PCR- 
based cloning and sequencing of multilocus genes, such as 
the OR gene family. To assess the validity of putative OR 
gene sequences obtained by the GS FLX sequencing (desig- 
nated FLX-based OR sequences), we corresponded each 
FLX-based OR sequence from the anole lizard to the OR 
gene sequences identified in its draft genome (designated 
DB-based OR sequences) using the FASTA search. Possible 
PCR-mediated recombination errors as well as homopoly- 
mer run-associated sequencing errors were picked up man- 
ually by checking the pairwise alignments resulting from the 
FASTA search. 

Phylogenetic Analyses 

Phylogenetic trees containing the ratsnake OR sequences 
were constructed using two different data sets. One data 
set consists of human (Niimura and Nei 2005), chicken 
(Niimura 2009), the anole lizard, and the ratsnake ORs. 
The other data set consists of OR sequences from the 
ratsnake, the python, and 14 reptilian taxa (Kishida and 
Hikida 201 0). Only the FLX-based OR sequences that consist 
of more than ten reads (i.e., >10 coverage) were used for 
the ratsnake (see Results). In each data set, deduced amino 
acid sequences were aligned using MAFFT program (Katoh 
et al. 2002), and the alignment was finally inspected and 
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A. DB-based OR sequences B. FLX-based OR sequences C. FLX-based OR sequences 

(the anole lizard genome assembly) (the anole lizard genomic DNA) (the anole lizard cDNA) 

51 2 

12 14 



14 



13 
12 



10 

Wn I 6 



D. FLX-based OR sequences 
(the ratsnake genomic DNA) 

13 






t DB-based OR sequences 
(the python genome assembly) 



Class I 




Fig. 1. — Relative gene composition of OR families (HORDE classification) identified from the anole lizard and the ratsnake. (A) ORs obtained from 
the anole lizard genome assembly. (6) FLX-based OR sequences from the anole lizard genomic DNA. (0 FLX-based OR sequences from the anole lizard 
nose cDNA. (D) FLX-based OR sequences from the ratsnake genomic DNA. (F) DB-based OR sequences from the python genome assembly. 



corrected by eye. Phylogenetic trees were constructed by 
the neighbor joining method with matrices of the Poisson- 
corrected amino acid distances, using MEGA 4 software 
package (Tamura et al. 2007). The reliability of each nodal 
relationship was assessed by 1000 bootstrap replications. 



Results 

Repertoire of OR Genes Identified from the Anole Lizard 
and Python Genome Assembly 

By a comprehensive data mining approach, we identified 
1 08 putatively functional and 28 disrupted and/or truncated 
(<600 bp) OR gene sequences from the anole lizard ge- 
nome assembly. All these DB-based OR sequences were in- 
tronless and most of them were tightly clustered within 
several chromosomal or scaffold regions (see supplementary 
table 1 , Supplementary Material online). The composition of 
HORDE families in the lizard OR gene repertoire is shown in 
figure 1 A All the anole lizard OR genes except one belonged 
to class II. One class I OR gene was classified into family 51 . 
Class II OR genes assigned to the same HORDE families were 
generally clustered together in a phylogenetic tree (see sup- 
plementary fig. 3, Supplementary Material online). 

From the python genome assembly, we identified 1 53 
functional, 13 disrupted, and 114 truncated OR gene 
sequences (see supplementary table 2, Supplementary 
Material online), although this gene repertoire may be 
somewhat incomplete owing to the lower coverage of 



the python genome (ca. 17x coverage from lllumina 
paired-end sequences; Castoe et al. 201 1). All these OR se- 
quences were intronless. Chromosomal positions of these 
OR sequences are unknown because the contigs of python 
genome are very short (typically ~2 kbp) and unconnected. 
The composition of HORDE families in the python OR gene 
repertoire is shown in figure 1F. Unlike the anole lizard, 17 
class I OR genes were identified in the python genome. In 
these class I OR genes, four genes were classified into family 
51, 12 were family 52, and 1 was family 55. 

The Anole Lizard OR Sequences Obtained by the NGS 
Approach 

Using the NGS approach, we obtained 6, 1 96 reads from the 
anole lizard genomic DNA and 6,854 reads from its nose 
cDNA (table 2). The initial assembling gave rise to 195 
and 1 94 contigs from genomic DNA and cDNA, respectively. 
After excluding non-OR sequences and OR sequences with 
<5x coverages as well as unifying possibly identical se- 
quence contigs (i.e., <1 % pairwise nucleotide differences: 
see Materials and Methods), 71 (genomic DNA) and 70 
(cDNA) distinct contigs remained. Fifteen (genomic DNA) 
and 30 (cDNA) artificial chimeric sequences were addition- 
ally detected by FASTA searches against the database se- 
quences. Excluding the chimeras from the data set, 56 
and 40 distinct OR sequences were finally identified from 
the lizard genomic DNA and cDNA, respectively (table 2). 
These sequences were found to contain ten (genomic 
DNA) and three (cDNA) homopolymer run-associated errors 
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A. The anole lizard genomic DNA 

5-1 00X 



Error (PCR recombination) 
■ Error (homopolymer runs) 
3 Correct sequences 




>101X 




B. The anole lizard nose cDNA 




C . The ratsnake genomic DNA 

5-100X >1Q1X 




\° <£> <$> sS> «?> <<? <\? 9? <£> 101-200 201-300 300-400 >400 

V' t\* q> ^ ^ & fe N ^ % N 



Fig. 2. — Histograms of the coverages or read numbers for each FLX-based contig sequence: {A) the anole lizard genomic DNA, (6) the anole lizard 
nose cDNA, and (O the ratsnake genomic DNA. 



(see supplementary tables 3 and 4, Supplementary Material 
online). 

Figure 2 shows histograms of the read coverage for the 
FLX-based OR sequences. For the sequences obtained from 
the lizard genomic DNA, most erroneous sequences had 
<11x coverages (fig. 24), whereas 15 of 30 chimeras 
had more than 10x coverages for the sequences obtained 
from the cDNA (fig. 2B). The higher rate of chimeras in 
cDNA-originated OR sequences could possibly be attribut- 
able to errors during the reverse transcription reaction. Thus, 
when we applied this approach to the characterization of 
the ratsnake OR genes amplified from its genomic DNA, 
contigs of <1 1 x coverages were considered to be possibly 
incorrect and thus discarded. 

The composition of HORDE families identified in the 
lizard's FLX-based OR sequences was very similar to that 



identified in the DB-based OR sequences (fig. 1/4 and B). 
We found that approximately one-third of the FLX-based 
OR sequences, when queried with their full-length OR gene 
sequences, was assigned to a different but neighboring 
HORDE family. However, the HORDE family distribution 
based on 56 full-length OR genes had no noticeable differ- 
ence from that from 56 partial OR gene sequences (data not 
shown). This indicates that the FLX-based sequencing 
method used in this study can cover almost all families of 
OR sequences, at least for the anole lizard but possibly 
for other species as well, even though all gene members 
of these families were not picked up. 

On the other hand, HORDE family composition of OR se- 
quences obtained from the genomic DNA was slightly dif- 
ferent from that obtained from the cDNA (fig. 1 B and O- For 
instance, OR sequences of families 2 and 8 were found only 
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from the genomic DNA. Proportion of family 5 OR sequen- 
ces in the latter (15/40 x 100 = 38%) was slightly larger 
than that in the former (1 5/56 x 1 00 = 27%), although this 
difference was not significant (P = 0.528, Fisher's exact 
test). These differences likely reflect mRNA expression levels 
of different OR genes in the lizard's olfactory epithelium. 

Expression of the Anole Lizard OR Genes 

Coverage numbers for each FLX-based OR sequence ob- 
tained from nose cDNA seem to reflect primarily its expres- 
sion level (i.e., abundance of the corresponding mRNA). 
However, they are also highly dependent on other factors, 
such as the efficiency of PCR amplification in relation to, 
for example, the matching of primers to individual gene 
copy sequences. On the other hand, coverage numbers 
for each FLX-based OR sequence obtained from genomic 
DNA are considered to reflect only the latter part because 
the copy number for each gene is equal in the genomic 
DNA. Thus, relative expression level of each OR-coding 
sequence may be roughly estimated by comparing its cov- 
erage number in cDNA-originated OR sequences with that 
in genomic DNA-originated sequences, provided that sim- 
ilar numbers of OR-coding reads are obtained from the 
two sources (table 2). If the former coverage number is 
significantly higher than the latter number, the corre- 
sponding OR gene copy may be considered highly ex- 
pressed. In a reverse situation, its expression level may 
be considered relatively low. 

Figure 3 shows comparison of the coverage numbers 
between the lizard genomic DNA-originated and cDNA- 
originated OR sequences. Because total numbers of OR- 
coding reads were similar between the two sources 
(5,043 of genomic DNA origin and 5,966 of cDNA origin; 
see table 2), the direct comparison of coverage numbers 
may provide us with information of the expression level for 
each gene. For four OR sequences, Ac13 (HORDE family 
4), Ac31 (family 1), Ac49 (family 14), and Ac129 (family 
9), coverages of their cDNA-originated sequences were 
more than five times as large as those of the corresponding 
genomic DNA-originated sequences, suggesting that 
these OR sequences were highly expressed in the olfactory 
epithelium of the individual used in this study. Conversely, 
22 OR sequences were found only in genomic DNA- 
originated sequences (fig. 3 and supplementary table 6, 
Supplementary Material online), suggesting that these 
ORs were not transcribed. Three of the 22 OR sequences 
were considered pseudogenes (Ac 119, Ac128, and 
Ac131), but remaining 1 9 sequences appeared functional. 
It seems noteworthy that six OR sequences (Ac6, Ac41, 
Ac73, Ac97, Ac103, and Ac118) were found only in 
cDNA-originated sequences but that expression levels 
of these sequences were not clear owing to their low 
coverage numbers. 




Coverage (genome) 

Fig. 3. — Comparison of the coverages or read numbers in FLX- 
based OR-coding sequences originated from the anole lizard genomic 
DNA and nose cDNA. Logarithmic scales are used for both axes, and 
a line stands for an equal level of coverages between the two sources 
after normalization of total OR-coding read numbers between the two 
sources (the slope: 5966/5043 = 1.18). To include OR sequences that 
were not found (i.e.. Ox coverage) in the scatter plot, we added one to 
the read coverage numbers of all sequences. 

The Ratsnake OR Genes 

We obtained 6,202 reads from the ratsnake genomic DNA 
by the FLX-based sequencing approach. The initial assem- 
bling generated 253 contigs, from which 140 contigs re- 
mained after excluding non-OR sequences and OR 
sequences with <5x coverages as well as unifying sequen- 
ces with <1 % sequence divergences (table 2). Due to the 
lack of reference genome sequence for the ratsnake, we 
were unable to specify chimeras in the resultant ratsnake 
OR sequences. We thus automatically removed contigs with 
<1 1 x coverages to minimize chimeric sequences (see the 
reasons outlined earlier). Finally, we identified 96 distinct 
OR sequences from the ratsnake genomic DNA (see table 2 
and supplementary table 5, Supplementary Material online) 
and used them for subsequent phylogenetic analyses. By 
translating them to amino acid sequences, we identified 
eight sequences that contain frameshifts in homopolymer 
runs. We also found three sequences in which coding 
frames were disrupted by nucleotide insertions or deletions 
outside the homopolymer region, although we could not 
judge whether they represent pseudogenes or sequencing 
artifacts. If contigs with 5x-10x coverages were included, 
these numbers considerably elevate (15 sequences with 
homopolymer run-associated errors and 9 disrupted se- 
quences outside the homopolymer region). Thus, ratsnake 
OR sequences with such low(<1 1 x) coverages may include 
a number of indels and/or disrupted stop codons possibly 
originated from PCR/sequencing artifacts (fig. 2Q. 
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Table 3 

Numbers of OR Genes Assigned to Each HORDE Family for the Anole Lizard and the Python DB-Based Sequences and the Ratsnake FLX-Based 
Sequences 





Number of Genes 


Number of Genes 


Number of Genes 


P Value (lizard vs. 


P Value (python vs. 


Family 


(the anole lizard; 


(tne python; 


(the ratsnake) 


ratsnake) 


ratsnake) 


1 


6 


14 


5 


1.000 


1.000 


2 


2 


27 


9 


0.01 1 a 


1.000 


4 


10 


15 


1 


0.054 


0.085 


5 


37 


64 


44 


0.052 


0.004 a 


6 


12 


38 


4 


0.292 


0.021 a 


7 


0 


1 


0 




1.000 


8 


2 


6 


1 


1.000 


0.684 


9 


5 


4 


6 


0.534 


0.024 


10 


21 


45 


8 


0.166 


0.123 


11 


13 


18 


2 


0.032 3 


0.124 


12 


1 


5 


10 


0.001 b 


0.001 b 


13 


3 


6 


2 


1.000 


1.000 


14 


23 


19 


4 


0.007 3 


0.466 


Class l c 


1 


17 


0(1) 


1.000 


0.009 3 


Not identified 


0 


1 


0 






Total 


136 


280 


96 







a Significant in the 5% level by Fisher's exact test. 

b Significant in the 5% level after Bonferroni correction. 

c The class I genes of the anole lizard and the ratsnake correspond to families 51 and 52, respectively. Seventeen class I genes of the python include 4 for family 51, 12 for family 
52, and 1 for family 55. Coverage of the ratsnake class I OR (8x) is lower than the cutoff (1 1 x). 



By conducting the FASTA search against the human OR 
amino acid sequences, each of the ratsnake OR sequences 
was assigned to the HORDE family. The resultant family 
occurrence of the ratsnake ORs was similar to that of 
the anole lizard ORs (fig. 18 and D), although proportions 
in numbers of family member genes were considerably dif- 
ferent between the two species (table 3). Gene proportions 
of families 2 and 1 2 in the ratsnake ORs were significantly 
larger than those in the lizard ORs. Difference in family 12 
was significant in the 5% level even after Bonferroni 
correction (P < 0.0038). Conversely, gene proportions of 
families 1 1 and 14 in the ratsnake ORs were significantly 
smaller than those in the lizard ORs. The family occurrence 
of the ratsnake ORs was also similar to that of the python 
ORs (fig. 1D and £), but proportions in gene numbers of 
a few families were different between these species 
(table 3). Gene proportions of families 5 and 1 2 in ratsnake 
ORs were significantly larger than those in the python ORs. 
Difference in family 1 2 was significant in the 5% level even 
after Bonferroni correction (P < 0.0036). Gene proportions 
of family 4 and class I ORs in ratsnake were significantly 
smaller than those in the python ORs. 

Evolution of the Ratsnake OR Genes 

Figure 4 shows a neighbor joining tree constructed using 
the OR partial sequences from the ratsnake and three 
vertebrate species (the anole lizard, human, and chicken). 
All the 96 ratsnake ORs were widely scattered within the 
class II clade (group y, Niimura and Nei 2005) without form- 
ing large lineage-specific phylogenetic clusters like the y-c 



clade for chicken (Niimura and Nei 2005; Steiger et al. 
2008). Many of these ratsnake ORs were clustered with 
the anole lizard ORs, implying their origination before the 
lineage divergence between these taxa. However, at least 
two small ratsnake-specific clades were found (i.e., clades 
A and B in fig. 4). 

As was the case with the anole lizard, only one class I OR 
sequence was potentially present in the ratsnake (table 3). 
However, this OR sequence had 8x coverage reads, which 
was lower than the tentative cutoff value (<1 1 x). Thus, 
this sequence was not included in the phylogenetic 
tree of figure 4. When included, the ratsnake class I OR 
sequence did not cluster with the anole lizard counterpart 
(data not shown). Blast searches against the HORDE data- 
base indicated that the ratsnake class I OR was assigned to 
family 52, whereas the anole lizard class I OR belonged to 
family 51 (table 3). This implies that the ratsnake class I OR 
has a different origin from the lizard counterpart. 

The phylogenetic tree of OR sequences from the 
ratsnake, the python, and 14 reptilian taxa (fig. 5) showed 
that a majority of the ratsnake ORs were evolutionarily 
close to the other squamate ORs. However, some ratsnake 
ORs were more closely related to turtle ORs than to squa- 
mate ORs, and the others were snake specific (e.g., ten 
genes in clade A). It seems noteworthy that the ratsnake 
had only one OR gene that belonged to the "Squamata- 
specific ORs" clade (Kishida and Hikida 2010). One part 
of the phylogenetic tree was occupied by turtle and croc- 
odilian OR genes without squamate ones, designated the 
Testudine and Crocodilian clade. Another striking feature is 
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Fig. 4. — A neighbor joining tree of 799 OR amino acid partial sequences from four vertebrate species (the ratsnake: 92, the anole lizard: 108, 
human: 387, and chicken: 212). The number of amino acid sites used for the analysis is 77. Bootstrap values of more than 50% are shown on major 
internal nodes only. The tree is rooted at an arbitrary position on a lineage between class I and class II ORs as indicated by an arrow. 



the representation of class I OR group by a number of the 
python OR genes. 

Discussion 

Efficiency of the High-Throughput Approach for 
Characterization of OR Gene Repertoires 

Using a tiny part (—1/30) of the sequencing capacity by a sin- 
gle run of GS FLX Titanium Genome Sequencer, we identi- 
fied many OR partial sequences from the two squamate 



species simultaneously. The broad phylogenetic distribution 
(supplementary fig. 3, Supplementary Material online) and 
the HORDE family composition (fig. 1 B) of FLX-based OR se- 
quences for the anole lizard showed that this approach can 
quickly characterize considerable unbiased proportions of 
OR gene members from multiple sources in parallel. Tagging 
the PCR products from different sources of template DNA is 
a key procedure in this method. However, this approach 
could not recover all 136 members of OR genes identified 
in the anole lizard genome assembly. The apparent recovery 
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Testudine & Crocodilian clade 
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Ratsnake 




Python 
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Squamata 
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Fig. 5. — An unrooted neighbor joining tree of 358 OR amino acid partial sequences from the ratsnake, python, and 14 reptilian species (7 
squamates, 6 testudines, and 1 crocodilian). The number of amino acid sites used for the analysis is 69. GenBank protein IDs and the sources of the OR 
genes of the 14 reptilian taxa were shown in supplementary table 7 (Supplementary Material online). Bootstrap values are shown on representative 
nodes only. 



rate of OR gene sequences was 41 % (56/136 x 100). Two 
A. carolinensis individuals, one used for the draft genome se- 
quencing (Castoe et al. 201 1 ) and the other used for the FLX- 
based sequencing (this study), may have the polymorphism in 
OR gene loci. Thus, the total number of OR genes owned by 
the latter individual is not necessarily 136. However, this low 
recovery rate still implies that many OR genes were not iden- 
tified by the NGS method. Then, how can one pursue higher 
recovery rates? Isn't it realistic to recover nearly complete OR 
gene members by the method? 



In order to gain perspectives into these questions, we 
conducted further analyses of the FLX-based sequences to- 
gether with some manual experiments. First, coverage read 
number for each FLX-based OR sequence was found to be 
rather heterogeneous (fig. 2), suggesting that PCR amplifi- 
cation efficiency of OR sequence varies from gene to gene. 
The partial recovery rate (41 %) is most likely due to differ- 
ences of primer matching to OR gene copies. At an early 
phase of this work, we designed several primers, with which 
A. carolinensis OR gene fragments were amplified, cloned 
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into E. coli, and manually sequenced. This preliminary exper- 
iment showed that the F1 and R1 primers (see table 1 for 
their sequences without tag regions) provided the broadest 
coverage of OR genes, all of which are included in the 56 
genomic DNA-originated OR sequences shown in table 2 
(data not shown). After the NGS experiments, we examined 
the matching of the F1 and R1 primers to 43 NGS-collected 
(either from genomic DNA or cDNA) and 65 NGS- 
uncollected functional OR genes (see supplementary fig. 4, 
Supplementary Material online). Whereas the F1 primer 
appears to have a similar matching to both NGS-collected 
and NGS-uncollected genes, the R1 primer shows a some- 
what reduced matching to NGS-uncollected genes, espe- 
cially at the fourth to seventh positions from the 3 '-end 
of the primer. 

We then designed additional primers (F2, F3, R2, and R3; 
for details, see table 1 ) by referring to sequences of the NGS- 
uncollected OR genes with an expectation that the new 
primer pairs (i.e., F2-R2 and F3-R3) can cover many of 
the NGS-uncollected genes as well as a certain proportion 
of the NGS-collected ones. We manually sequenced 100 
clones having an insert of the amplified products (49 clones 
for F2-R2 and 51 clones for F3-R3). The 49 clones for F2-R2 
provided 20 distinct OR genes covering 1 9 of 1 36 OR genes 
identifiable from the lizard draft genome sequence, plus one 
new OR gene not identified in the draft genome. Seven of 
the 20 genes were not collected by the NGS approach using 
the F1-R1 primers. Similarly, the 51 F3-R3 clones provided 
1 4 distinct OR genes covering 1 0 of the 1 36 OR genes, plus 
4 new OR genes not identified in the draft genome. Six of 
the 14 genes were not collected by the NGS approach using 
the F1-R1 primers. As shown in supplementary figure 3 
(Supplementary Material online), these OR genes collected 
using the F2-R2 and F3-R3 pairs appear to be randomly dis- 
tributed in a phylogenetic tree, as is the case with those col- 
lected using the F1-R1 pair. 

We also considered a possibility that the >5x coverage 
criterion for identifying OR sequences in contigs restricted 
the recovery rate and that more intensive sequencing by 
the NGS approach could pick up genes with a low amplifi- 
cation efficiency. We conducted the BlastN search (%ID 
>0.99 and >50 bp; see Materials and Methods for the rea- 
soning) of all 6,196 genomic DNA-originated FLX reads 
(table 2) against the 1 36 OR gene sequences. Sixty-two po- 
tential OR sequences including 52 functional ones (see sup- 
plementary fig. 3, Supplementary Material online) were 
identified by this search. Together with six OR genes not rep- 
resented in the draft genome (see supplementary table 3, 
Supplementary Material online), 68 OR sequences may 
have been collected if we had obtained much more FLX 
reads so that all these sequences satisfied the >5x coverage 
criterion. Taken together, by expanding the read depth and 
combining three different primer pairs for the NGS ap- 
proach, we estimate that as many as 81 A. carolinensis 



OR sequences could be collected. The cDNA-originated 
NGS characterization showed that six additional OR genes 
could be amplified and identified by the F1-R1 primer pair 
(Ac6, Ac41 , Ac73, Ac97, Ac1 03, and Ac1 1 8; supplementary 
table 6, Supplementary Material online). We consider that 
these OR genes can be basically identifiable by the NGS 
approach, though we do not know why they were not rep- 
resented in the genomic DNA-based FLX reads. Thus, the 
identifiable A. carolinensis OR genes by the NGS approach 
can collectively reach 87 (87/136 x 100 = 64%). This is 
still not the level of exhaustive characterization of all OR 
gene members, but we do not deny a possibility that the 
combinatory use of more primer pairs may be able to reach 
this level in future. 

Accuracy of the High-Throughput Approach for 
Characterization of OR Gene Repertoires 

Accuracy of the NGS approach in identifying OR genes was 
assessed by comparing FLX-based OR sequences obtained 
from the anole lizard genomic DNA with the corresponding 
DB-based OR sequences using the FASTA search. In princi- 
ple, two types of errors could be included in the FLX-based 
OR sequences: errors caused by PCR amplification, such as 
nucleotide substitutions and chimera formation, and errors 
caused by the FLX pyrosequencing, such as nucleotide sub- 
stitutions and indels usually associated with homopolymer 
runs (Margulies et al. 2005; Moore et al. 2006). In this study, 
most nucleotide substitutions originated from PCR errors 
seem to be excluded from the resultant OR sequences in 
contigs because each OR sequence is determined at least 
five times and substitution errors were excluded by gener- 
ating consensus sequences. Indeed, 35 of 56 FLX-based OR 
sequences originated from the lizard genomic DNA were 
completely identical with the corresponding DB-based OR 
sequences (see supplementary table 3, Supplementary 
Material online). This indicates the low error rate of nucle- 
otide substitutions in the FLX-based OR sequences. 

On the other hand, chimeras were frequently found in 
the FLX-based OR sequences (fig. 2). As described in Mate- 
rials and Methods, we employed a PCR condition that min- 
imizes PCR errors and chimera formation during PCR: the 
use of PrimeSTAR HS DNA polymerase (Takara Bio) that have 
both high fidelity and high efficiency in amplification, and 
restriction of amplification cycles to 28 (Lenz and Becker 
2008). In spite of this endeavor, formation of some chimeric 
sequences in amplifying multicopy genes seems unavoid- 
able (Saitoh and Chen 2008). Twelve of 15 chimeras 
(80%) were found to have <11x coverage in FLX-based 
OR sequences originated from the lizard genomic DNA 
(fig. 2A), whereas 28 of 30 chimeras (93%) were found 
to have 20x or fewer coverage in cDNA-originated OR se- 
quences (fig. 2fi). Higher frequency of chimeras in the 
cDNA-originated OR sequences is possibly caused by the 
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reverse transcription reaction. One way to eliminate artificial 
chimeras as much as possible is therefore not to use contigs 
with 20 x or fewer coverages. However, this would lead to 
elimination of considerable numbers of true OR sequences 
with 1 1 x-20x coverage. Thus, we decided to set the <1 1 x 
cutoff coverage value for OR sequences originated from the 
ratsnake genomic DNA. 

In previous studies, OR genes in nonmodel vertebrates 
have been PCR amplified, cloned, and sequenced by the tra- 
ditional Sanger sequencing method (Buck and Axel 1991; 
Ngai et al. 1993; Freitag et al. 1995; Kishida et al. 2007; 
Steiger et al. 2008; Hashiguchi and Nishida 2009; Kishida 
and Hikida 2010). However, this traditional approach has 
limitations in extensive identification of OR gene family. 
Moreover, it seems very difficult to avoid errors associated 
with PCR (nucleotide substitutions and chimera formation) 
by sequencing limited numbers of clones manually. The NGS 
approach can potentially overcome some of these problems 
by determining much larger numbers of OR sequences than 
the traditional approach, although some chimeric sequen- 
ces may still exist in low-coverage contigs. In addition, 
the NGS method can handle multiple samples (different 
species and individuals) simultaneously by adding tags to 
PCR primers. Finally, the NGS approach does not use a step 
for molecular cloning into bacteria and thus seems freer 
from the cloning bias than the traditional approach. This 
is especially advantageous in comparing numbers of 
cDNA-originated FLX reads to gain insights into gene ex- 
pression. Taken together, the current approach using the 
NGS power seems promising toward the efficient and accu- 
rate characterization of multigene family genes and their 
transcripts in nonmodel organisms in future. 

Expression of the Anole Lizard OR Genes 

In the present study, we identified 40 different OR sequen- 
ces from the nose cDNA of the anole lizard. It is likely that 
these OR sequences represent a highly expressed set of OR 
gene copies in the lizard's olfactory epithelium. In addition, 
four OR sequences (Ac13, Ac31, Ac49, and Ac129) were 
suggested to have a notably high level of expression 
(fig. 3). Among them, Ac129 OR sequence identified from 
the genome database appeared to be a pseudogene (see 
supplementary tables 1, 3, and 4, Supplementary Material 
online). However, Ac129 may be a functional gene in an in- 
dividual used for this study because Ac1 29 cDNA-originated 
sequence, at least within the sequenced 331 bp region, 
did not contain any disrupted stop codon or frameshift that 
existed in the Ac129 database sequence. If the Ac129 
database sequence does not include a sequencing error, 
a possible explanation is that Ac129 represents a se- 
gregating pseudogene, where both functional and non- 
functional alleles coexist in a locus. Several segregating 
pseudogenes for ORs have been reported in, for example, 



human (Menashe et al. 2003, 2007), and one of such OR 
loci was suggested to relate to differences of odor sensitivity 
among individuals (Menashe et al. 2007). Investigating 
the Ac129 polymorphism in future may be interesting to 
understand the genetic basis of odor sensitivity in the anole 
lizard. 

Comparison of the coverage read numbers of OR se- 
quences originated from genomic DNA versus cDNA indi- 
cated that 22 lizard OR genes (19 intact and 3 disrupted 
sequences) were not detected for their expression in the 
olfactory epithelium (fig. 3). If this really reflect their lack 
of expression rather than any biases in our experiments 
(e.g., low efficiency in the reverse transcription reaction), 
approximately 39% (22/56 x 100) of the lizard OR genes 
are not expressed in the adult olfactory epithelium. DNA 
microarray studies suggested that proportion of silent 
OR genes that are not expressed in the olfactory epithelium 
is less than 30% in human and mouse (Zhang et al. 2004, 
2007). Iguanian lizards including the green anole lizard 
have highly developed the visual sense, and most iguanians 
may not be dependent on a well-developed olfactory 
system (Zug et al. 2001). The relatively low proportion 
of expressed lizard OR genes may reflect the reduced role 
of olfaction in this species. Verification of this speculation 
should await more rigorous comparison of expressed OR 
gene repertoire in diverse groups of squamates. 

Diversity and Evolution of the Squamate OR Genes 

Using the NGS approach, we identified 96 distinct OR gene 
sequences from the ratsnake genomic DNA. By applying the 
same criterion (e.g., exclusion of contigs with <1 1 x cover- 
ages without the chimeric test), 47 OR gene sequences were 
identifiable from the anole lizard genomic DNA (see supple- 
mentary table 3, Supplementary Material online). Under 
a simple equal recovery rate assumption (47/136 x 100 = 
35%) for the ratsnake, total number of OR gene loci in this 
species was roughly estimated to be 278 (96/47 x 136). 
When the coverage cutoff criterion is changed from 
<1 1 x to <5x, a similar number of the OR gene loci was 
estimated for the ratsnake (140/71 x 136 = 268). 

These estimates are close to the estimated number of OR 
genes in the Burmese python (280; table 3) for which the 
draft genome sequence is newly available (Castoe et al. 
201 1). Thus, the OR gene numbers in these snakes seem 
larger than those in the anole lizard (1 36, table 3) and zebra 
fish (1 76, Niimura 2009) but much smaller than those in frog 
(1,638, Niimura 2009) and human (802, Niimura and Nei 
2005), even smaller than those in chicken (433) and zebra 
finch (553) (Niimura 2009; Steiger et al. 2009). Thus, the 
snakes may have less diverse OR gene repertoire than most 
nonsquamate tetrapods. This may sound somewhat unex- 
pected because most colubrid and pythonid snakes are 
known to have an acute sense of smell (Zug et al. 2001; 
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Fig. 6. — Schematic illustration of the evolution of vertebrate class I (a and ji) and class II (y) OR genes. In the phylogenetic tree, branches shown in 
thick lines indicate the squamate lineage. 



Pianka and Vitt 2003; Vitt et al. 2003). However, many squa- 
mate reptiles have highly developed the vomeronasal olfac- 
tory system, and their sharpness in olfaction is dependent on 
both nasal and vomeronasal receptors (Zug et al. 2001; 
Pianka and Vitt 2003; Vitt et al. 2003). One intriguing 
possibility is that these snakes took a strategy to diversify 
vomeronasal receptor genes (i.e., V1Rs and V2Rs) rather 
than nasally expressed OR genes. 

Phylogenetic analyses showed that OR sequences of the 
anole lizard and the two snakes are broadly distributed in 
the phylogenetic tree (figs. 4 and 5). This implies that these 
squamate lineages have kept a number of OR subfamilies 
that originated before the divergence of mammalian and 
reptilian/avian lineages. Although large lineage-specific phy- 
logenetic clusters as seen in avian species (e.g., chicken y-c 
clade) were not found for the squamates, there was a small 
snake-specific OR clade (clade A: see figs. 4 and 5). Clade B 
was also specific to the ratsnake and other squamates (figs. 4 
and 5). OR gene members in these specific clades may have 
recently increased by gene duplications for adaptation to 
squamate-specific odor environments. 

Figure 6 outlines the evolution of squamate OR genes. As 
reviewed in the literature (Niimura and Nei 2005; Niimura 
2009), the repertoire of class I (a) and class II (y) OR genes 
was expanded when amphibian ancestors emerged to land. 
However, without the radiation of the y-c clade members, 
birds have relatively small numbers of OR genes (e.g., 94 and 
16 for chicken and zebra finch OR genes outside the y-c 
clade, respectively; Steiger et al. 2009). Except for the y-c 
clade, both squamate and chicken ORs are broadly distrib- 
uted in the phylogenetic tree (fig. 4), suggesting that most 
OR subfamilies in birds and squamates originated before 
their divergence. We thus consider that the OR gene reper- 



toire may have been shrunk for both class I (a and fi) and 
class II (y) genes in the genome of an ancestral sauropsid 
lineage. Resultantly, avian and squamate OR gene repertoire 
may consist of basically small numbers of OR genes. How- 
ever, we cannot strictly exclude the possibility that ORs in 
avian and squamate lineages decreased independently. In 
response to the ecological needs to detect more or less 
smells, individual squamate members probably fluctuated 
the number of OR genes in their genome. Most snakes have 
highly developed sense of smell (Zug et al. 2001), and they 
may have retained somewhat larger numbers of OR genes 
than the anole lizard. 

An unexpected finding in the python OR gene repertoire 
was the presence of 1 7 class I genes that include two group 
fi member (table 3; supplementary table 2, Supplementary 
Material online). The group (1 OR gene was not found in the 
anole lizard OR gene repertoire, and the NGS approach did 
not identify any group fl gene in the ratsnake (table 3; 
supplementary table 5, Supplementary Material online). 
Without assuming a horizontal transfer of the group /? gene 
to the python genome, a reasonable explanation could be 
deletions of the group ji gene in multiple lineages leading to 
the anole lizard, the ratsnake, and even birds (fig. 6). It was 
shown that group ji OR genes in tetrapods are actually 
orthologous to some teleost fish OR genes (Niimura 
2009), implying a possibility that the group ji ORs are used 
to recognize odor chemicals common to aquatic and terres- 
trial vertebrates. Niimura (2009) thus deduced that the 
group ji ORs may detect both volatile and water-soluble 
chemicals, such as alcohol. Repeated loss of the group ji 
OR genes in squamates may be related to the ecological dif- 
ferences among species, such as habitat preferences. The 
Burmese python is known to show water-dependent life 
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style occasionally (The Invasive Species Specialist Group 
2010), and this might be related to the retention of the 
group p OR gene in this species. Characterization of 
the OR gene repertoire in more squamate taxa will clarify 
the evolutionary mechanisms of the group p OR genes more 
fully. Also, we currently do not know why the python has 
a larger number of class I (a. and p) OR genes than the anole 
lizard and the ratsnake (table 3). Any reasonable explanation 
to connect the class I gene variation to ecological features 
may be expected by the further characterization of squa- 
mate OR gene repertoires. 

In the present study, we investigated the anole lizard OR 
genes to evaluate the usefulness of the NGS approach in 
characterizing OR genes in nonmodel vertebrate species. 
The NGS approach should be broadly applicable to efficient 
characterization of vertebrate OR genes and their transcripts 
and therefore promises to expand the scale of future studies 
on vertebrate olfactory systems. For example, comparative 
analyses of OR gene transcripts between male and female 
lizards by the NGS approach may provide a clue to under- 
standing molecular basis of olfactory recognition of conspe- 
cific individuals in mating. 

Supplementary Material 

Supplementary figures 1-4, tables 1-7, and data 1 and 2 are 
available at Genome Biology and Evolution online (http:// 
www.gbe.oxfordjournals.org/). 
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